Prediction of retweeting behavior for imbalanced dataset in microblogs
ZHAO Yu, SHAO Bilin, BIAN Genqing, SONG Dan
Journal of Computer Applications, 2015, 35(7): 1959-1964. DOI: 10.11772/j.issn.1001-9081.2015.07.1959

To address the problem that imbalanced datasets degrade the prediction of retweeting behavior in microblogs, a novel prediction algorithm based on oversampling and the Random Forest (RF) algorithm was proposed. Firstly, retweeting-related features were defined, covering individual information, social relationships and topic information, and the key features were selected using information gain. Secondly, taking the characteristics of microblog feature data into account, an improved oversampling algorithm based on the Synthetic Minority Over-sampling Technique (SMOTE) was proposed. In this algorithm, the probability distribution of the original dataset was estimated by nonparametric distribution estimation, and oversampling with the improved SMOTE method was then carried out according to this approximate distribution so that the numbers of positive and negative examples were balanced. Finally, a random forest classifier was trained on the retweeting-related key features, with its parameters selected by analyzing the Out-Of-Bag (OOB) error estimate. The proposed method was compared with the Decision Tree (DT), Support Vector Machine (SVM), Naive Bayes (NB) and RF algorithms previously used to analyze microblog retweeting behavior. Its overall performance was superior to that of the SVM-based method, which obtained the best results among the baseline methods: recall and F-measure improved by 8% and 5%, respectively. The experimental results show that the proposed method can effectively improve the prediction accuracy of microblog retweeting behavior analysis in practical applications.
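The oversampling step can be illustrated with a minimal sketch of plain SMOTE. This is not the improved variant described in the abstract, which additionally estimates the probability distribution of the original dataset; the abstract gives no details of that step, so it is omitted here. The function name `smote`, its parameters, and the toy feature vectors are assumptions made for illustration only.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Plain SMOTE (illustrative, not the paper's improved variant).

    Each synthetic point is placed on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours, excluding the base point.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical toy data: 3 positive (retweeted) vs 8 negative examples.
positives = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4]]
negatives = [[5.0 + 0.1 * i, 6.0 - 0.1 * i] for i in range(8)]

# Add just enough synthetic positives to balance the two classes.
balanced_positives = positives + smote(
    positives, len(negatives) - len(positives), k=2
)
print(len(balanced_positives), len(negatives))
```

Because every synthetic point is an interpolation between two real minority samples, the new points stay inside the region already occupied by the minority class. A balanced set of this kind would then be passed to the classifier training step.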
